A definition of entropy via the Kolmogorov algorithmic complexity is discussed. As examples, we show how the mean-field theory for the Ising model and the entropy of a perfect gas can be recovered. The connection with computation is pointed out by paraphrasing the laws of thermodynamics for computers. Also discussed is an approach that may be adopted to develop statistical mechanics using the algorithmic point of view.



I. INTRODUCTION 

The purpose of this lecture note is to illustrate a route 
for the definition of entropy using our experience with 
computers. In the process the connection between sta- 
tistical physics and computations comes to the fore. 



A. What is entropy?

This is a question that plagues almost all students, especially beginning physics students. There are several correct ways to answer it.

1. It is the perfect differential that one gets by dividing the heat transferred by a quantity T that gives us the hot-cold feeling, i.e. the temperature. (Thermodynamics)

2. It is the log of the number of states available. (Boltzmann)

3. It is something proportional to $-\sum_i p_i \ln p_i$, where $p_i$ is the probability that the system is in state i. (Gibbs)

4. It is just an axiom that there exists an extensive quantity S, obeying certain plausible conditions, from which the usual thermodynamic rules can be obtained. (Callen)

But the colloquial link between disorder or randomness and entropy remains unexpressed, though, admittedly, making a formal connection is not easy. Our plan is to establish this missing link à la Kolmogorov.

Besides these conceptual questions, there is a practical issue that bugs many who do computer simulations, where different configurations are generated by some set of rules. In the end one wants to calculate various thermodynamic quantities, which involve both energy and entropy. Now, each configuration generated during a simulation or time evolution has an energy associated with it. But does it have an entropy? The answer is, of course, blowing in the wind. All thermodynamic behaviour ultimately comes from a free energy, say $F = \langle E\rangle - TS$, where E, the energy, generally known from mechanical ideas like the Hamiltonian, enters as an average, denoted by the angular brackets, but there is no such average for S. As a result, one cannot talk of the "free energy" of a configuration at any stage of the simulation. All the definitions mentioned above associate S with the ensemble, or with distributions over the phase space. They simply forbid the question "what is the entropy of a configuration?" Too bad!

B. On computers

Over the years we have seen the size of computers shrinking, their speed increasing and their power requirement going down. Centuries ago, a question that tickled scientists was the possibility of converting heat to work, or of finding a perfect engine, going in a cycle, that would completely convert heat to work. A current version of the same problem would be: can we have a computer that does computations but, at the end, does not require any energy? Or: we take a computer, draw power from a rechargeable battery to do the computation, then do the reverse operations and give the energy back to the battery. Such a computer is, in principle, a perpetual computer. Is it possible?

What we mean by a computer is a machine or an ob- 
ject that implements a set of instructions without any 
intelligence. It executes whatever it has been instructed 
to do without any decision making at any point. At the 
outset, without loss of generality, we choose binary (0,1) 
as the alphabet to be used, each letter to be called a bit. 
The job of the computer is to manipulate a given string as per instructions. Just as in physics, where we are interested in the thermodynamic limit of an infinitely large number of particles, volume, etc., we would be interested in infinitely long strings. The question therefore is: "can bit manipulations be done without any cost of energy?"



II. RANDOMNESS 

The problem that a configuration can not have an en- 
tropy has its origin in the standard statistical problem 
that a given outcome of an experiment cannot be tested 
for randomness. E.g., one number generated by a random 
number generator cannot be tested for randomness. 

For concreteness, let us consider a general model system of a magnet consisting of spins $s_i = \pm 1$ arranged on a square lattice, with i representing a lattice site. If necessary, we may also use an energy (or Hamiltonian) $E = -J\sum_{\langle ij\rangle} s_i s_j$, where the sum is over nearest neighbours (i.e. bonds of the lattice). Suppose the temperature is so high that each spin can be in any one of the two states ±1 with equal probability. We may generate such a configuration by repeated tossing of a fair coin. If we get

+ - - + + + - + - + +   (+: H, -: T)

is it a random configuration? Or can the configurations of spins shown in Fig. 1 be considered random?
FIG. 1: Ising magnet. Spins ±1 are represented by arrows pointing up or down. (A) A ferromagnetic state, (B) an antiferromagnetic state, and (C) a seemingly random configuration.

With N spins (or bits), under tossing of a fair coin, the probability of getting Fig. 1(A) is $2^{-N}$, and so is the probability of (B) or (C). Therefore, the fact that a process is random cannot be used to guarantee randomness of the sequence of outcomes. Still, we do have a naive feeling. All heads in N coin-toss experiments, or strings like 1111111... (the ferro state of Fig. 1(A)) or 10101010... (the antiferro state of Fig. 1(B)), are never considered random because one can identify a pattern, but a string like 110110011100011010001001... (or the configuration of Fig. 1(C)) may be taken as random. But what is it that gives us this feeling?



A. Algorithmic approach 

The naive expectation can be quantified by a different type of argument, not generally emphasized in physics. Suppose I want to describe the string by a computer programme, or rather by an algorithm. Of course there is no unique "programming" language, nor is there "a" computer, but these are not very serious issues. We may choose, arbitrarily, one language and one computer, transform all other languages to this language (by adding "translators"), and always use that particular computer. The two strings, the ferro and the antiferro states, can then be obtained as outputs of two very small programmes,

(A) Print "1" 5 million times (ferro state)

(B) Print "10" 2.5 million times (antiferro state)

In contrast, the third string would come from

(C) Print "110110011100..." (disordered state)



so that the size of the programme is the same as the size of the string itself. This example shows that the size of the programme gives an expression to the naive feeling of randomness we have. We may then adopt it as a quantitative measure of randomness.

Definition : Let us define randomness of a 
string as the size of the minimal programme 
that generates the string. 

The crucial word is "minimal" . In computer parlance 
what we are trying to achieve is a compression of the 
string and the minimal programme is the best compres- 
sion that can be achieved. 
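In general, the minimal programme itself cannot be computed. Any real compressor, however, gives an upper bound on its size. The sketch below uses Python's standard `zlib` module as such a stand-in "language". The string length, the random seed and the choice of `zlib` are illustrative assumptions and are not part of the argument above. The sketch reproduces the naive feeling about Fig. 1:

- the ferro and antiferro strings shrink to almost nothing;
- the coin-toss string cannot be squeezed below about one bit per spin.

```python
import random
import zlib

def compressed_bits(bits: str) -> int:
    """Size in bits of the zlib (level 9) compression of a '0'/'1' string.

    Only an upper-bound proxy for the minimal programme length S(c):
    zlib is one fixed, far-from-optimal 'language'.
    """
    return 8 * len(zlib.compress(bits.encode("ascii"), 9))

N = 100_000                                   # number of spins (bits), illustrative
ferro = "1" * N                               # (A) Print "1" N times
antiferro = "10" * (N // 2)                   # (B) Print "10" N/2 times
rng = random.Random(2024)                     # fixed seed for a reproducible run
disordered = "".join(rng.choice("01") for _ in range(N))  # (C) fair-coin tosses

for name, s in [("ferro", ferro), ("antiferro", antiferro), ("disordered", disordered)]:
    print(f"{name:>10}: {len(s)} bits -> about {compressed_bits(s)} bits")
```

Each '0'/'1' character occupies a full byte before compression. The disordered string still needs roughly N bits or a little more, in line with programme (C).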

Another name given to what we called "randomness" is complexity, and this particular measure is called the Kolmogorov algorithmic complexity. The same quantity, randomness, is also called information, because the more we can compress a string, the less is its information content. Information and randomness are then two sides of the same coin: the former expresses a positive aspect, the latter a negative one!

Let $\Pi(c)$ be a programme for the string of configuration c, and let us denote the length of any string by $|\ldots|$. The randomness or complexity is

$S(c) = \min |\Pi(c)|,$ (1)

where the minimum is taken over all programmes that generate c.

We now define a string as random if its randomness or complexity is similar to the length of the string or, to be quantitative, if the randomness is larger than a pre-chosen threshold, say $S(c) > |c| - 13$. The choice of 13 is surely arbitrary here, and any number would do.



1. Comments 

A few things need to be mentioned here.

(i) By definition, a minimal programme is random, because its size cannot be reduced further.

(ii) It is possible to prove that a string is not random by explicitly constructing a small programme, but in general it is not possible to prove that a string is random. This is related to Gödel's incompleteness theorem. For example, the digits of π may look random (and are believed to be so) until one realizes that they can be obtained from an efficient routine for, say, $\tan^{-1}$. We may not have a well-defined way of constructing minimal algorithms, but we agree that such an algorithm exists.

(iii) The arbitrariness in the choice of language leads to some indefiniteness in the definition of randomness. This can be cured by agreeing to add a translator programme to all other programmes, which leaves the difference between the randomness of two strings unchanged. In other words, randomness is defined up to an arbitrary additive constant. Entropy in classical thermodynamics also has that arbitrariness.

(iv) Such a definition of randomness satisfies a type of subadditivity condition, $S(c_1 + c_2) \le S(c_1) + S(c_2) + O(1)$, where $c_1 + c_2$ denotes the concatenated string and the $O(1)$ term cannot be ignored.
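Comment (ii) can be made concrete. The few lines below generate as many digits of π as requested from Machin's formula, $\pi/4 = 4\tan^{-1}(1/5) - \tan^{-1}(1/239)$, using integer arithmetic. The choice of Machin's formula and the ten guard digits are our own illustrative assumptions. However long the output, the programme stays the same size, apart from the roughly $\log_2 n$ bits needed to specify n. So the digits of π, random as they may look, have very low Kolmogorov complexity.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Approximately scale * arctan(1/x), summed from its Taylor series in integers."""
    term = scale // x          # scale / x, the first term of the series
    total = term
    x2 = x * x
    k, sign = 3, -1
    while term:
        term //= x2            # scale / x**k
        total += sign * (term // k)
        k, sign = k + 2, -sign
    return total

def pi_digits(n: int) -> str:
    """First n decimal digits of pi (including the leading 3) via Machin's formula."""
    guard = 10                                  # extra digits absorb truncation errors
    scale = 10 ** (n + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    return str(pi_scaled)[:n]

print(pi_digits(50))   # 31415926535897932384626433832795028841971693993751
```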
B. Entropy

Accepting that this Kolmogorovian approach to randomness makes sense, and since we connect randomness in a physical system with entropy, let us associate this randomness S(c) with the entropy of that string or configuration c. For an ensemble of strings or configurations, with probability $p_i$ for the i-th string or configuration $c_i$, the average entropy will be defined by

$S_K = \sum_i p_i\, S(c_i)$ (2)

(taking the Boltzmann constant $k_B = 1$). We shall claim that this is the thermodynamic entropy we are familiar with.

Since the definition of entropy in Eq. (2) looks ad hoc, let us first show that this definition gives us back the results we are familiar with. To complete the story, we then establish the equivalence with the Gibbs definition of entropy.

C. Example I: Mean field theory for the Ising model

Consider the Ising problem. Let us try to write the free energy of a state with $n_+$ up (+) spins and $n_-$ down (-) spins, with $n_+ + n_- = N$. The number of such configurations is

$\Omega = \frac{N!}{n_+!\,n_-!}.$ (3)

An ordered list (say, lexicographic) of all of these Ω configurations is then made. If all of these states are equally likely to occur, then one may specify a state by a string that identifies its location in the list of configurations. The size of the programme is then the number of bits required to store numbers of the order of Ω. Let S be the number of bits required. For general N, $n_+$, $n_-$, S is given by

$2^S = \Omega \;\Rightarrow\; S = \log_2\Omega.$ (4)

Stirling's approximation then gives

$S = N\log_2 N - n_+\log_2 n_+ - n_-\log_2 n_-$
$\;\;\, = -N\,[p\log_2 p + (1-p)\log_2(1-p)],$ (5)

with $p = n_+/N$, the probability of a spin being up. The resemblance of Eq. (4) to the Boltzmann formula for entropy (Sec. I A) should not go unnoticed here. Eq. (5) is the celebrated formula that goes under the name of entropy of mixing for alloys, solutions, etc.
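Eqs. (4) and (5) are easy to compare numerically. The sketch below computes $S = \log_2\Omega$ exactly with Python's arbitrary-precision integers. It sets this against the Stirling form $-N[p\log_2 p + (1-p)\log_2(1-p)]$. The sizes N and the value p = 0.3 are arbitrary choices for illustration. The gap grows only like $\log_2 N$, so the relative error disappears in the thermodynamic limit.

```python
import math

def exact_bits(N: int, n_up: int) -> float:
    """S = log2(Omega) with Omega = N! / (n_up! n_down!), Eqs. (3)-(4)."""
    return math.log2(math.comb(N, n_up))

def stirling_bits(N: int, n_up: int) -> float:
    """Stirling form of S, Eq. (5): -N [p log2 p + (1-p) log2(1-p)]."""
    p = n_up / N
    return -N * sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0)

for N in (100, 1_000, 10_000, 100_000):
    n_up = 3 * N // 10                        # p = 0.3
    ex, st = exact_bits(N, n_up), stirling_bits(N, n_up)
    print(f"N={N:>7}: exact {ex:12.2f}  Stirling {st:12.2f}  relative gap {(st - ex) / st:.2e}")
```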

1. Comments 

It is important to note that no attempt has been made at "minimization" of the algorithm; in other words, we have not attempted to compress Ω. For example, no matter what the various strings are, all of the $2^N$ configurations of N spins can be generated by a loop (algorithm represented schematically)

      i = 0
  10  L = length of i in binary
      Print "0" (N-L) times, then i in binary
      i = i+1
      If ( i < 2^N ) go to 10
      stop

For a suitable choice of N (one that itself has a very short description), the code for the representation of N can be shortened enormously by compressing N. This shows that one may generate all the spin configurations by a small programme, even though many individual configurations would require much bigger programmes. This should not be considered a contradiction, because the loop produces much more than we want. It is fair to put a restriction that the programmes we want should be self-delimiting (meaning that they should stop without intervention) and should produce just what we want, preferably with no extra output. Such a restriction then automatically excludes the above loop.
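For concreteness, here is a Python rendering of the schematic enumeration, written so that it produces all $2^N$ configurations. It is a sketch, and the function name is ours. The programme text is tiny and depends on N only through the number passed in, yet it emits every configuration. That is exactly why the self-delimiting, no-extra-output restriction is needed.

```python
from typing import Iterator

def all_configurations(N: int) -> Iterator[str]:
    """Yield every N-spin configuration as an N-bit string, in lexicographic order.

    Mirrors the schematic loop: i in binary, padded with leading zeros to N bits.
    """
    for i in range(2 ** N):
        yield format(i, f"0{N}b")

print(list(all_configurations(3)))
# ['000', '001', '010', '011', '100', '101', '110', '111']
```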

Secondly, many of the numbers in the sequence from 1 to Ω can be compressed enormously. However, the enumeration scheme we use cannot be crucial for the physical properties of a magnet, and therefore we do need S bits to convey an arbitrary configuration. It is also reassuring to realize that there are random (i.e. incompressible) strings among the $2^N$ possible N-bit strings. The proof goes as follows. If an N-bit string is compressible, then its compressed length is at most $N-1$. But there are only $\sum_{k=0}^{N-1} 2^k = 2^N - 1$ strings of length at most $N-1$. Now the compression procedure has to be one-to-one (unique), otherwise decompression will not be possible. Hence, for every N, there are strings which are not compressible and therefore random.
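The counting argument can also be made quantitative. A decodable (one-to-one) compressor can send at most as many N-bit strings to descriptions of length at most $N-c$ as there are such descriptions, namely $2^{N-c+1}-1$. So fewer than a fraction $2^{1-c}$ of all N-bit strings can be shortened by c bits or more. With the threshold c = 13 used earlier, that is fewer than 1 string in 4096. The helper names below are ours.

```python
from fractions import Fraction

def num_strings_up_to(length: int) -> int:
    """Number of binary strings of length 0, 1, ..., length."""
    return sum(2 ** k for k in range(length + 1))      # equals 2**(length + 1) - 1

def max_fraction_compressible(N: int, c: int) -> Fraction:
    """Upper bound on the fraction of N-bit strings with a description of <= N - c bits."""
    return Fraction(num_strings_up_to(N - c), 2 ** N)

# Not every N-bit string can be shortened: there are only 2**N - 1 shorter strings.
print(num_strings_up_to(20 - 1) == 2 ** 20 - 1)                 # True
# With the threshold of 13 bits, fewer than 1 in 4096 strings fail to be random.
print(max_fraction_compressible(1000, 13) < Fraction(1, 4096))   # True
```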

A related question is the time required to run a programme. What we have defined so far is the "space" requirement. It is also possible to define a "time complexity" by the time required to get the output. In this note we avoid this issue of time altogether.

2. Free energy 

In the Kolmogorov approach we can now write the free energy of any configuration $c_i$ as $F_i = E_i - TS_i$, with the thermodynamic free energy coming from the average over all configurations,

$F = \langle F_i\rangle = \langle E\rangle - T\langle S\rangle.$

If we now claim that the S obtained in Eq. (5) is the entropy of any configuration, then, since no compression is used, it is the same for all of them (this is obviously an approximation), and we may use $\langle S\rangle = S$. The average energy may be approximated by assuming a random mixture of up and down spins with an average value $\langle s\rangle = p - (1-p) = 2p - 1$. Each of the $Nq/2$ bonds then contributes $-J\langle s\rangle^2$, where q is the number of nearest neighbours (4 for a square lattice). The free energy per spin is then given by

$\frac{F}{N} = -\frac{qJ}{2}(2p-1)^2 + T\,[p\ln p + (1-p)\ln(1-p)],$ (6)

where the entropy of Eq. (5) has been converted from bits to natural units (a factor of $\ln 2$).

Note that we have not used the Boltzmann or the Gibbs formula for entropy. By using the Kolmogorov definition, what we get back is the mean field (or Bragg-Williams) approximation for the Ising model. As is well known, minimization of F with respect to p gives the mean-field ferromagnetic transition and the Curie-Weiss law for the magnetic susceptibility near it. There is no need to go into the details, because the purpose of this exercise is to show that the Kolmogorov approach works.
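Minimizing Eq. (6) numerically shows the transition explicitly. The sketch below takes $k_B = 1$, J = 1 and q = 4 (square lattice) as illustrative assumptions. It writes $m = 2p - 1$ and minimizes $F/N$ over $0 \le m < 1$ by ternary search, which is valid because $F/N$ is unimodal on that interval. Below the mean-field temperature $T_c = qJ$ a nonzero magnetization appears. The minimizer then satisfies the familiar self-consistency condition $m = \tanh(qJm/T)$.

```python
import math

def free_energy_per_spin(m: float, T: float, q: int = 4, J: float = 1.0) -> float:
    """F/N of Eq. (6) with m = 2p - 1 and k_B = 1 (natural logarithms)."""
    p = (1.0 + m) / 2.0                         # probability of an up spin
    energy = -0.5 * q * J * m * m               # -(qJ/2)(2p - 1)^2
    mixing = sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0.0)  # p ln p + (1-p) ln(1-p)
    return energy + T * mixing                  # E/N - T S/N

def equilibrium_magnetization(T: float, q: int = 4, J: float = 1.0, iters: int = 200) -> float:
    """Minimise F/N over 0 <= m < 1 by ternary search (F/N is unimodal there)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if free_energy_per_spin(a, T, q, J) < free_energy_per_spin(b, T, q, J):
            hi = b                              # minimum lies left of b
        else:
            lo = a                              # minimum lies right of a
    return 0.5 * (lo + hi)

for T in (2.0, 3.0, 3.9, 4.1, 6.0):            # mean-field T_c = qJ = 4 here
    print(f"T = {T}: m = {equilibrium_magnetization(T):.4f}")
```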



D. Example II: Perfect gas 

A more elementary example is the Sackur-Tetrode formula for the entropy of a perfect gas. We use cells of small size ΔV such that each cell may contain at most one particle. For N particles there are $\Omega = (V/\Delta V)^N$ configurations, because each particle can be in one of $V/\Delta V$ cells. The size in bits is $S = N\log_2(V/\Delta V)$, so that the change in randomness or entropy as the volume is changed from $V_1$ to $V_2$ is

$\Delta S = N\log_2\frac{V_2}{V_1}.$ (7)



The indistinguishability factor can also be taken into account in the above argument, but since it does not affect Eq. (7), we do not go into that. Similarly, the momentum contribution can also be considered.



FIG. 2: Perfect gas: space divided into cells. The cells are occupied by the particles.

It may be noted here that the work done in the isothermal expansion of a perfect gas is

$\int_{V_1}^{V_2} P\,dV = Nk_BT\ln\frac{V_2}{V_1} = (k_B\ln 2)\,T\,\Delta S,$ (8)

where P is the pressure satisfying $PV = Nk_BT$ and ΔS is defined in Eq. (7). Both Eqs. (7) and (8) are identical to what we get from thermodynamics. The factor $\ln 2$ appears because of the change of base of the logarithm from 2 to e.
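As a numerical check of Eqs. (7) and (8), consider one mole of a perfect gas doubling its volume isothermally at 300 K. These values, and the SI constants, are our illustrative inputs. Eq. (7) gives $\Delta S = N$ bits, one bit per molecule, because each molecule now has twice as many cells available. Both sides of Eq. (8) then give about 1.73 kJ.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K (exact SI value)
N_A = 6.02214076e23       # Avogadro constant, 1/mol (exact SI value)

def delta_S_bits(N: float, V1: float, V2: float) -> float:
    """Eq. (7): change in randomness, in bits, when N particles go from V1 to V2."""
    return N * math.log2(V2 / V1)

def isothermal_work(N: float, T: float, V1: float, V2: float) -> float:
    """Left side of Eq. (8): integral of P dV with P V = N k_B T, in joules."""
    return N * K_B * T * math.log(V2 / V1)

N, T, V1, V2 = N_A, 300.0, 1.0, 2.0
dS = delta_S_bits(N, V1, V2)
W = isothermal_work(N, T, V1, V2)
print(f"Delta S = {dS:.4e} bits, W = {W:.1f} J, (k_B ln2) T Delta S = {K_B * math.log(2) * T * dS:.1f} J")
```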

It seems logical enough to take this route to the defi- 
nition of entropy and it would remove much of the mist 
surrounding entropy in the beginning years of a physics 
student. 



III. COMPUTERS 

A. On computation 

For the computer problem mentioned in the Introduction, one needs to ponder a bit about reality. In thermodynamics, one considers a reversible engine which may not be practical, and may not even be implementable. But a reversible system without dissipation can always be justified as an idealization. Can one do the same for computers?



1. Reversible computers? 

To implement an algorithm (as given to it), one needs logic circuits consisting of, say, AND and NAND gates (all other gates can be built from these). Each of these requires two inputs (a, b) to give one output (c). By construction, such gates are irreversible: given c, one cannot reconstruct a and b. However, it is possible, at the cost of extra signals, to construct a reversible gate (called a Toffoli gate) that gives AND or NAND depending on a third, extra signal. The truth table is given in Appendix A. Reversibility is obvious. A computer based on such reversible gates can run both ways: after the end of the manipulations it can be run backwards, because the hardware now allows that. Just like a reversible engine, we now have a reversible computer. All our references to computers will be to such reversible computers.
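The claim that reversibility is obvious can be checked directly. Here is a small sketch (our own, not from the original) of the Toffoli gate as the controlled-controlled-NOT map $(a, b, c) \mapsto (a, b, c \oplus ab)$. It confirms the two properties used in the text:

- the third output is the AND of a and b when c = 0, and the NAND when c = 1;
- the map is a bijection on the eight input triples, and in fact its own inverse.

```python
from itertools import product

def toffoli(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Toffoli (controlled-controlled-NOT) gate: flip c if and only if a = b = 1."""
    return a, b, c ^ (a & b)

# c = 0 gives AND of a and b in the third output; c = 1 gives NAND.
for a, b in product((0, 1), repeat=2):
    assert toffoli(a, b, 0)[2] == (a & b)
    assert toffoli(a, b, 1)[2] == 1 - (a & b)

# Reversibility: a bijection on the 8 input triples, and its own inverse.
triples = list(product((0, 1), repeat=3))
assert sorted(toffoli(*t) for t in triples) == triples
assert all(toffoli(*toffoli(*t)) == t for t in triples)
print("Toffoli gate: AND/NAND and reversibility checks passed")
```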



2. Laws of computation 

Let us try to formulate a few basic principles applica- 
ble to computers. These are rephrased versions of laws 
familiar to us. 

Law I: It is not possible to have perpetual 
computation. 

In other words, we cannot have a computer that can read a set of instructions and carry out computations to give us the output without any energy requirement. Proving this is not straightforward, but it is consistent with our intuitive ideas. We won't pursue this. Such a computer may be called a perpetual computer of type I; the first law forbids such perpetual computers.

Law II: It is not possible to have a computer 
whose sole purpose is to draw energy from a 
reversible source, execute the instructions to 
give the output and run backward to deliver 
the energy back to source, and yet leave the 
memory at the end in the original starting 
state. 

A computer that can actually do this will be called a 
perpetual computer of second kind or type II. 
3. What generates heat?

In order to see the importance of the second law, we 
need to consider various manipulations on a file (which is 
actually a string). Our interest is in long strings (length 
going to infinity as in thermodynamic limit in physics). 
Now suppose we want to edit the file and change one 
character, say, in the 21st position. We may then start 
with the original file and add an instruction to go to that 
position and change the character. As a result the edit 
operation is described by a programme which is almost 
of the same length (at least in the limit of long strings) 
as the original programme giving the string. Therefore 
there is no change in entropy in this editing process. Sup- 
pose we want to copy a file. We may attach the copy 
programme with the file. The copy programme itself is 
of small size. The copy process therefore again does not 
change the entropy. One may continue with all the pos- 
sible manipulations on a string and convince oneself that 
all (but one) can be performed at constant entropy. 

The exceptional process is the deletion or removal of a file. There is no need to elaborate that this is a vital process in any computation. When we remove a file, we are replacing the entire string by all zeros, a state with negligible entropy. It is this process that reduces the entropy by N (in bits) for a random string of N characters. In conventional units, the heat produced at temperature T is therefore $Nk_BT\ln 2$ (see Eq. (8)). We know from physics that entropy reduction does not happen naturally (we cannot cool a system easily).

4. Memory as fuel

We can have a reversible computer that starts by taking energy from a source to carry out the operations. To run it backward (via Toffoli gates), however, it has to store a lot of redundant information in memory. The processes are iso-entropic and can be reversed after getting the output, giving the energy back to the source. Even so, we no longer have the memory in the same "blank" state we started with. To get back to that "blank" state, we have to clear the memory (remove the strings). This last step lowers the entropy, a process that cannot be carried out without help from outside. If we do not want to clear the memory, the computer will stop working once the memory is full.

This is the second law that prohibits the perpetual computer of the second kind. The similarity with thermodynamic rules is apparent. To complete the analogy, a computer is like an "engine" and memory is the fuel. From a practical point of view, this loss of entropy is given out as heat (similar to the latent heat on freezing of water). Landauer in 1961 pointed out that the heat produced due to this loss of entropy is $k_BT\ln 2$ per bit, or $Nk_BT\ln 2$ for N bits. For comparison, one may note that $Nk_B\ln 2$ is the total amount of entropy lost when an Ising ferromagnet is cooled from a very high temperature paramagnetic phase to a very low temperature ferromagnetic phase. If the process of deletion on a computer occurs very fast in a very small region of space, this heat generation can create problems. It therefore puts a limit on miniaturization or speed of computation. Admittedly this limit is not too realistic, because other real-life processes would play major roles in determining the speed and size of a computer. See Appendix C for an estimate of the heat generated.
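To put numbers on Landauer's bound, the sketch below evaluates $k_BT\ln 2$ per erased bit at room temperature. It also gives the corresponding minimum heat for clearing a memory of $8\times 10^9$ bits (1 GB). Both T = 300 K and the memory size are our assumed, illustrative values. The result, a few times $10^{-21}$ J per bit, is far below the energy that real circuits dissipate per logic operation. This is consistent with the remark that other processes dominate in practice.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def landauer_heat(bits: float, T: float) -> float:
    """Minimum heat (J) released when `bits` random bits are erased at temperature T."""
    return bits * K_B * T * math.log(2)

per_bit = landauer_heat(1, 300.0)
one_gigabyte = landauer_heat(8e9, 300.0)
print(f"per bit: {per_bit:.3e} J, 1 GB: {one_gigabyte:.3e} J")
```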

B. Communication 

1. The problem 

Let us now look at another aspect of computers namely 
transmission of strings (or files) or communication. This 
topic actually predates computers. To be concrete, let 
us consider a case where we want to transmit images discretized into small cells of four colours, R, G, B and Y, with probabilities

$p(R) = 1/2,\quad p(G) = 1/4,\quad p(B) = p(Y) = 1/8.$

The question in communication is: "What is the minimal 
length of string (in bits) required to transmit any such 
image?" 

2. Kolmogorov and Shannon's theorem

There are two possible ways to answer this question. The first is given by the Kolmogorov entropy (= randomness = complexity), while the second is given by a different, powerful theorem called Shannon's noiseless coding theorem. Given a long string $c_j$ of, say, N characters, if we know its Kolmogorov entropy $S_j$, then that has to be the smallest size for that string. If we now consider all possible N-character strings, with $\mathcal{P}_j$ as the probability of the j-th string, then $S_K = \sum_j \mathcal{P}_j S_j$ is the average number we are looking for. Unfortunately, it is not possible to compute $S_j$ in all cases. Here we get help from Shannon's theorem. The possibility of transmitting a signal that can be decoded uniquely is guaranteed with probability 1 if the average number of bits per character is $-\sum_i p_i\log_2 p_i$, where the $p_i$ are the probabilities of the individual characters. A proof of this theorem is given in Appendix B. Since the two refer to the same object, they are the same with probability 1, i.e.,

$S_K = -N\sum_i p_i\log_2 p_i.$
3. Examples 

The applicability of the Shannon theorem is now shown for the above example. To choose a coding scheme, we need to restrict ourselves to prefix codes (i.e. codes that do not use one codeword as the "prefix" of another codeword). As an example, if we choose R = 0, G = 1, B = 10, Y = 11, decoding cannot be unique. E.g., what is 010: RGR or RB? The nonuniqueness here comes from the fact that B (10) has the code of G (1) as its first part, or prefix. A scheme which is prefix free is called a prefix code.
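The ambiguity is easy to exhibit by brute force. The sketch below lists every way of parsing a bit string into codewords and checks the prefix condition directly; the function names are ours. For the code R = 0, G = 1, B = 10, Y = 11, the string 010 has two parses. The prefix code R = 0, G = 10, B = 110, Y = 111 admits only one.

```python
def is_prefix_free(code: dict[str, str]) -> bool:
    """True if no codeword is a prefix of (or equal to) another symbol's codeword."""
    words = list(code.values())
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )

def all_parses(bits: str, code: dict[str, str]) -> list[str]:
    """Every way of splitting `bits` into codewords, returned as symbol strings."""
    if not bits:
        return [""]
    parses = []
    for symbol, word in code.items():
        if bits.startswith(word):
            parses += [symbol + rest for rest in all_parses(bits[len(word):], code)]
    return parses

ambiguous = {"R": "0", "G": "1", "B": "10", "Y": "11"}
prefix = {"R": "0", "G": "10", "B": "110", "Y": "111"}
print(all_parses("010", ambiguous))   # ['RGR', 'RB']
print(all_parses("010", prefix))      # ['RG']
```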

For our original example, we may choose R = 0, G = 10, B = 110, Y = 111 as a possible coding scheme. The average length required to transmit a colour is then

$\langle\ell\rangle = 1\times\frac{1}{2} + 2\times\frac{1}{4} + 2\times 3\times\frac{1}{8} = \frac{7}{4}.$ (9)

It is a simple exercise to show that no other prefix code can give a smaller average size. What is remarkable is that

$-\sum_i p_i\log_2 p_i = 7/4,$

an expression we are familiar with from the Gibbs entropy and also see in the Shannon theorem.
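Both numbers in Eq. (9) and the entropy sum can be checked with exact rational arithmetic. In the sketch below (helper names are ours) the entropy comes out as exactly 7/4. This is because every probability is a power of 1/2. For the same reason the codeword lengths $-\log_2 p_i$ are integers, and the code meets the bound exactly.

```python
from fractions import Fraction

probs = {"R": Fraction(1, 2), "G": Fraction(1, 4), "B": Fraction(1, 8), "Y": Fraction(1, 8)}
code = {"R": "0", "G": "10", "B": "110", "Y": "111"}

def average_length(probs: dict, code: dict) -> Fraction:
    """<l> = sum_i p_i * (length of the codeword for symbol i), as in Eq. (9)."""
    return sum((p * len(code[s]) for s, p in probs.items()), Fraction(0))

def entropy_dyadic(probs: dict) -> Fraction:
    """-sum_i p_i log2 p_i, exact when every p_i is of the form 1/2**k."""
    total = Fraction(0)
    for p in probs.values():
        k = p.denominator.bit_length() - 1          # candidate k with p = 1/2**k
        if p != Fraction(1, 2 ** k):
            raise ValueError("only probabilities 1/2**k are handled exactly")
        total += p * k                              # -p log2 p = p * k
    return total

print(average_length(probs, code), entropy_dyadic(probs))   # 7/4 7/4
```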

In case the source changes its pattern and starts sending signals with equal probability

$p(R) = p(G) = p(B) = p(Y) = 1/4,$

we may adopt a different scheme with

R = 00, G = 10, B = 01, Y = 11,

for which the average length is

$\langle\ell\rangle = 2 = -\sum_i p_i\log_2 p_i.$

This is less than what we would get if we stuck to the first scheme (which now gives $\langle\ell\rangle = 9/4$). Such simple schemes may not work for arbitrary cases, as, e.g., for

$p(R) = 1/2,\quad p(G) = 1/4 - 2\epsilon,\quad p(B) = p(Y) = 1/8 + \epsilon.$

In the first scheme we get $\langle\ell\rangle = 7/4 + 2\epsilon$, while the second scheme would give $\langle\ell\rangle = 2$. In the limit $\epsilon = 1/8$, we can opt for a simpler code

R = 0, B = 10, Y = 11, with $\langle\ell\rangle = 3/2$.

One way to reduce this length is then to make a list of all the $2^{NS}$ typical strings of N characters (see Appendix B), where $S = -\sum p\log_2 p$, in some particular order, and then transmit the item number of the message. This cannot require more than S bits per character. We see the importance of the Gibbs formula, but here it is called the Shannon entropy.
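Listing the typical strings is one way to approach S bits per character. A closely related construction is a Huffman code applied to blocks of k characters; we swap it in here because it is easy to code. This is a sketch of that substitute technique, not the enumeration described above. It uses $p(R) = 1/2$, $p(G) = 1/4 - 2\epsilon$, $p(B) = p(Y) = 1/8 + \epsilon$ with the illustrative choice $\epsilon = 1/16$. Even the single-character Huffman code (1.8125 bits) beats both simple schemes (1.875 and 2 bits). Coding longer blocks brings the average down toward the entropy, since a Huffman code for blocks satisfies $kS \le k\langle\ell\rangle_k < kS + 1$.

```python
import heapq
import itertools
import math

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    if len(probs) == 1:
        return [1]
    tie = itertools.count()                      # tie-breaker: never compare lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:                # merged symbols sit one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths

def bits_per_character(probs, k):
    """Average Huffman length per character when k i.i.d. characters are coded together."""
    blocks = [math.prod(c) for c in itertools.product(probs, repeat=k)]
    lengths = huffman_lengths(blocks)
    return sum(q * l for q, l in zip(blocks, lengths)) / k

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

eps = 1 / 16
probs = [1 / 2, 1 / 4 - 2 * eps, 1 / 8 + eps, 1 / 8 + eps]   # R, G, B, Y
print(f"S = {entropy(probs):.4f}")
for k in (1, 2, 3, 4):
    print(f"k = {k}: {bits_per_character(probs, k):.4f} bits per character")
```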

4. Entropy

It is to be noted that the Shannon theorem looks at the ensemble and not at each string independently. Therefore the Shannon entropy $S = -\sum_j \mathcal{P}_j\ln\mathcal{P}_j$ is ensemble based. But, as the examples of the magnet and the noninteracting gas showed, this entropy can be used to get the entropy of individual strings.

Given a set, like the colours in the above example, we can have different probability distributions for the elements. The Shannon entropy would be determined by that distribution. In the Kolmogorov case, we are assigning an "entropy" $S_j$ to the j-th long string or state. $S_K$, however, is determined by the probabilities $\mathcal{P}_j$ of the long strings, which are in turn determined by the p's of the individual characters. Since both refer to the best compression on average, they have to be equivalent. It should, however, be noted that this equivalence holds only in the limit and is a probability-1 statement. This means that there are configurations which are very unlikely to occur, and they are not counted in the Shannon entropy. Instead of the full list to represent all the configurations (as we did in Eqs. (3) and (4)), it suffices to consider a smaller list consisting of the relevant or typical configurations. They are $2^{-N\sum p\log_2 p} = 2^{NS}$ in number (see Appendix B for details), typically requiring S bits per character. A physical example may illustrate this. All configurations of molecules in a gas are allowed and should be taken into account. Even so, it is known that not much harm is done by excluding those configurations where all the molecules are confined in a small volume in one corner of a room. In fact, giving equal weightage to all the configurations in Eq. (3) is one of the sources of the approximations of mean-field theory.



IV. STATISTICAL MECHANICS 

We now try to argue that statistical mechanics can 
also be developed with the above entropy picture. To 
do so, we consider the conventional canonical ensemble, 
i.e., a system defined by a Hamiltonian or energy H in 
contact with a reservoir or bath with which it can ex- 
change only energy. In equilibrium, there is no net flow 
of energy from one to the other but there is exchange of 
energy going on so that our system goes through all the 
available states in phase space. This process is conventionally described by appropriate equations of motion but, although this is not generally done, one may think of the exchange as a communication problem. In equilibrium, the system is in all possible states, with probability $p_i$ for the i-th state, and is always in communication with the reservoir about its configuration. The communication is therefore a long string of the states of the system, each drawn independently from the same distribution (that is the meaning of equilibrium). It seems natural to make the hypothesis that nature picks the optimal way of communication. We of course assume that the communication is noiseless. The approach to equilibrium is just the search for the optimal communication. The approach process has a time dependence in which the "time" complexity would play a role. It has no bearing on equilibrium, however, and need not worry us. With that in mind, we may make the following postulates:
(1) In equilibrium, the energy $\langle E\rangle = \sum_i p_i E_i$ remains constant.

(2) The communication with the reservoir is optimal, with entropy $S = -\sum_i p_i\ln p_i$.

(3) For a given average energy, the entropy is maximum, to minimize failures in communication.

The third postulate actually ensures that the maximum possible number of configurations ($= e^S$) are taken into account in the communication process. No attempt has been made to see if these postulates can be further minimized.

With these sensible postulates, we have the problem of maximizing S with respect to the $p_i$, keeping $\langle E\rangle$ constant and $\sum_i p_i = 1$. With Lagrange multipliers β and λ for the two constraints, $\partial/\partial p_i\,[S - \beta\sum_j p_jE_j - \lambda\sum_j p_j] = 0$ gives $\ln p_i = -\beta E_i - 1 - \lambda$. That is, $p_i = \exp(-\beta E_i)/Z$, with $Z = \sum_i \exp(-\beta E_i)$ being the standard partition function. The parameter β is to be chosen such that one gets back the average energy. The usual arguments of statistical mechanics can now be used to identify β with the inverse temperature of the reservoir.
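The variational result can be checked numerically on a toy system. The sketch below assumes four levels with energies 0, 1, 2, 3 (in units with $k_B = 1$) and a target average energy of 1. It finds β by bisection, since the average energy decreases monotonically with β. It then perturbs the resulting distribution in a direction that keeps both $\sum_i p_i$ and $\langle E\rangle$ fixed. The entropy drops, as postulate (3) requires of the maximum.

```python
import math

def gibbs(energies, beta):
    """p_i = exp(-beta E_i) / Z."""
    weights = [math.exp(-beta * e) for e in energies]
    Z = sum(weights)                                   # partition function
    return [w / Z for w in weights]

def mean_energy(p, energies):
    return sum(pi * e for pi, e in zip(p, energies))

def entropy(p):
    """S = -sum_i p_i ln p_i, postulate (2)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def solve_beta(energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for beta with <E>(beta) = target; <E> decreases as beta grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(gibbs(energies, mid), energies) > target:
            lo = mid            # energy still too high: raise beta (lower T)
        else:
            hi = mid
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 2.0, 3.0]
beta = solve_beta(energies, target=1.0)
p = gibbs(energies, beta)

# Direction v with sum(v) = 0 and sum(v * E) = 0: normalisation and <E> unchanged.
v = [1.0, -2.0, 1.0, 0.0]
q = [pi + 0.01 * vi for pi, vi in zip(p, v)]
print(f"beta = {beta:.6f}, S(Gibbs) = {entropy(p):.6f}, S(perturbed) = {entropy(q):.6f}")
```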



V. SUMMARY 

We have tried to show how the Kolmogorov approach 
to randomness may be fruitfully used to define entropy 
and also to formulate statistical mechanics. Once the 
equivalence with conventional approach is established, 
all calculations can then be done in the existing frame- 
work. What is gained is a conceptual framework which 
lends itself to exploitation in understanding basic issues 
of computations. This would not have been possible in 
the existing framework. This also opens up the possibil- 
ity of replacing "engines" by "computers" in teaching of 
thermodynamics. 
APPENDIX A: TOFFOLI GATE

The truth table of the Toffoli gate is given below. With three inputs a, b, c, the outputs are a' = a, b' = b and c'. The output c' is the AND or NAND operation of a and b, depending on whether c = 0 or 1.

Fig. A1: Toffoli gate (inputs a, b, c; outputs a', b', c').

 a  b  c  |  a' b' c'
 0  0  0  |  0  0  0
 0  0  1  |  0  0  1
 0  1  0  |  0  1  0
 0  1  1  |  0  1  1
 1  0  0  |  1  0  0
 1  0  1  |  1  0  1
 1  1  0  |  1  1  1
 1  1  1  |  1  1  0


APPENDIX B: PROOF OF SHANNON'S THEOREM

The statement of Shannon's noiseless coding theorem is:

If $\langle\ell\rangle$ is the minimal average code length of an optimal code, then

$S \le \langle\ell\rangle < S + 1,$

where $S = -\sum_j p_j\log_2 p_j$.

The adjective "noiseless" is meant to remind us that there is no error in communication. A more verbose statement would be:

If we use $N\langle\ell\rangle$ bits to represent strings of N characters with Shannon entropy S, then a reliable compression scheme exists if $\langle\ell\rangle > S$. Conversely, if $\langle\ell\rangle < S$, no compression scheme is reliable.

The equivalence of the two statements can be seen by recognizing that S need not be an integer, but a codeword length had better be.



1. Simple motivation

Let us first go through a heuristic argument to motivate Shannon's coding theorem. Suppose a source is emitting signals $\{c_i\}$, independently and identically distributed, with two possible values: $c_i = 0$ with probability $p_0 = p$, and $c_i = 1$ with probability $p_1 = 1-p$. For a long enough string $C = c_1c_2c_3c_4\ldots c_N$, the probability is

$\mathcal{P}(C) = p(c_1)\,p(c_2)\,p(c_3)\,p(c_4)\cdots p(c_N)$ (B1a)
$\qquad\;\approx p^{Np}(1-p)^{N(1-p)}$ (B1b)
$\qquad\;= 2^{N[p\log_2 p + (1-p)\log_2(1-p)]},$ (B1c)

because for large N the expected number of 0s is $Np$ and that of 1s is $N(1-p)$. This expression shows that the probability of a long string is determined by

$S(\{p_j\}) = -[p\log_2 p + (1-p)\log_2(1-p)],$ (B2)
the "entropy" for this particular problem. Note the subtle
change from Eq. (B1a) to Eq. (B1b): the actual numbers
of 0s and 1s in the string have been replaced by their
expectation values. This use of expectation values for large $N$
led to the result that most of the strings, which may be called the
"typical" strings, belong to a subset of $2^{NS}$ strings (out of
a total of $2^N$ strings).
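A quick numerical check of Eqs. (B1) and (B2), assuming $p = 0.3$ and $N = 10^5$ (arbitrary choices): draw one long string from the source and compare $-N^{-1}\log_2\mathcal{P}(C)$ with $S$. The helper names are hypothetical.

```python
import math
import random

def binary_entropy(p):
    """Eq. (B2): entropy of a source emitting 0 w.p. p and 1 w.p. 1 - p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def log2_prob(string, p):
    """log2 P(C) for an i.i.d. binary string, i.e. the log of Eq. (B1a)."""
    zeros = string.count(0)
    ones = len(string) - zeros
    return zeros * math.log2(p) + ones * math.log2(1 - p)

rng = random.Random(1)                  # fixed seed so the run is reproducible
p, N = 0.3, 100_000
C = [0 if rng.random() < p else 1 for _ in range(N)]
rate = -log2_prob(C, p) / N             # should be close to S for large N
print(f"-log2 P(C)/N = {rate:.4f},  S = {binary_entropy(p):.4f}")
```

The sampled string has a probability very close to $2^{-NS}$, even though it is not the string with exactly $Np$ zeros used in Eq. (B1b).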

2. What is "Typical"? 

Let us define a typical string more precisely for any
distribution. A string of $N$ symbols $C = c_1c_2c_3c_4\ldots c_N$
will be called typical (or, better, $\epsilon$-typical) if

$$2^{-N(S+\epsilon)} \le \mathcal{P}(C) \le 2^{-N(S-\epsilon)} \qquad (B3)$$

for a given $\epsilon > 0$. Eq. (B3) may also be rewritten as

$$-\epsilon \le -\frac{1}{N}\log_2\mathcal{P}(C) - S \le \epsilon \qquad (B4)$$

3. How many typical strings? 

Now, since the $c_i$ are independent identically distributed
random variables, the $X_i$ defined by $X_i = -\log_2 p(c_i)$ are
also independent identically distributed random variables. It is
then expected that $\bar{X} = N^{-1}\sum_i X_i$, the average of the
$X_i$ over the string, should for large $N$ approach the ensemble
average, namely, $\langle X\rangle = -\sum_j p_j\log_2 p_j = S$. This
expectation comes from the law of large numbers, which states that

$$\mathrm{Prob}\left[\,\left|N^{-1}\sum_i\left(-\log_2 p(c_i)\right) - S\right| \le \epsilon\,\right] \xrightarrow{N\to\infty} 1, \qquad (B5)$$

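for any $\epsilon > 0$. This means that, given $\epsilon$, for large
enough $N$ the probability in Eq. (B5) is greater than $1 - \delta$,
where $\delta > 0$ can be made as small as we like. Recognizing that

$$\sum_i \log_2 p(c_i) = \log_2\prod_i p(c_i) = \log_2\mathcal{P}(C), \qquad (B6)$$

Eq. (B5) implies

$$\mathrm{Prob}\left[\,\left|-N^{-1}\log_2\mathcal{P}(C) - S\right| \le \epsilon\,\right] \ge 1 - \delta. \qquad (B7)$$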

We conclude that the probability that a string is typical,
as defined in Eqs. (B3) and (B4), is at least $1 - \delta$.
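The definition (B4) and the estimate (B7) can be sketched in Python as follows. The helper names, the values $p = 0.3$ and $\epsilon = 0.05$, and the number of trials are illustrative assumptions. The estimated probability of drawing a typical string should grow towards one as $N$ increases.

```python
import math
import random

def binary_entropy(p):
    """Eq. (B2) for a source emitting 0 w.p. p and 1 w.p. 1 - p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def is_typical(string, p, eps):
    """eps-typicality test of Eq. (B4) for an i.i.d. binary string."""
    N = len(string)
    zeros = string.count(0)
    log2_P = zeros * math.log2(p) + (N - zeros) * math.log2(1 - p)
    return abs(-log2_P / N - binary_entropy(p)) <= eps

def fraction_typical(N, p, eps, trials, rng):
    """Monte Carlo estimate of Prob[C is eps-typical], cf. Eq. (B7)."""
    hits = 0
    for _ in range(trials):
        C = [0 if rng.random() < p else 1 for _ in range(N)]
        hits += is_typical(C, p, eps)   # True counts as 1
    return hits / trials

rng = random.Random(0)                  # fixed seed for reproducibility
for N in (10, 100, 1000):
    print(N, fraction_typical(N, p=0.3, eps=0.05, trials=2000, rng=rng))
```

For $N = 10$ only strings with exactly three 0s pass the test, so the fraction is small; by $N = 1000$ almost every sampled string is typical, which is the content of Eq. (B7).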

Let us now try to estimate $\mathcal{N}_{\rm typ}$, the total
number of typical strings. Let us use a subscript $\mu$ for
the typical strings, with $\mu$ going from 1 to $\mathcal{N}_{\rm typ}$. The
sum of the probabilities $\mathcal{P}_\mu$ of the typical strings has to be
less than or equal to one, and using the definition of Eq. (B3),
we have one inequality

$$1 \ge \sum_\mu \mathcal{P}_\mu \ge \sum_\mu 2^{-N(S+\epsilon)} = \mathcal{N}_{\rm typ}\,2^{-N(S+\epsilon)}. \qquad (B8)$$

This gives $\mathcal{N}_{\rm typ} \le 2^{N(S+\epsilon)}$.



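Let us now get a lower bound for $\mathcal{N}_{\rm typ}$. We have just
established that the probability for a string to be typical
is at least $1 - \delta$. Using the other limit from Eq. (B3), we have

$$1 - \delta \le \sum_\mu \mathcal{P}_\mu \le \sum_\mu 2^{-N(S-\epsilon)} = \mathcal{N}_{\rm typ}\,2^{-N(S-\epsilon)}, \qquad (B9)$$

which gives $\mathcal{N}_{\rm typ} \ge (1-\delta)\,2^{N(S-\epsilon)}$. The final result is that
the total number of typical strings satisfies
$2^{N(S+\epsilon)} \ge \mathcal{N}_{\rm typ} \ge (1-\delta)\,2^{N(S-\epsilon)}$, where $\delta > 0$ can be chosen small
for large $N$. Hence, in the limit of large $N$ and small $\epsilon$,

$$\mathcal{N}_{\rm typ} \approx 2^{NS}. \qquad (B10)$$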
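For a binary source the bounds (B8) and (B9) can be checked exactly, because all strings with the same number of 0s have the same probability. The sketch below, with illustrative parameters and a hypothetical helper name, counts the $\epsilon$-typical strings with `math.comb` instead of enumerating them.

```python
import math

def binary_entropy(p):
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def typical_set_stats(N, p, eps):
    """Return (S, number of eps-typical strings, their total probability).

    Source: c = 0 with probability p, c = 1 with probability 1 - p.
    All C(N, k) strings with k zeros share the same probability.
    """
    S = binary_entropy(p)
    count, prob = 0, 0.0
    for k in range(N + 1):
        log2_P = k * math.log2(p) + (N - k) * math.log2(1 - p)
        if abs(-log2_P / N - S) <= eps:               # typicality condition (B4)
            n_k = math.comb(N, k)
            count += n_k                              # exact integer count
            prob += 2.0 ** (math.log2(n_k) + log2_P)  # log domain avoids float overflow
    return S, count, prob

for N in (100, 1000):
    S, n_typ, prob = typical_set_stats(N, p=0.3, eps=0.05)
    print(f"N={N}: log2(N_typ)/N = {math.log2(n_typ) / N:.3f}, "
          f"S = {S:.3f}, Prob(typical) = {prob:.3f}")
```

$\log_2\mathcal{N}_{\rm typ}/N$ stays within $\epsilon$ of $S$, while the typical set remains a vanishing fraction $2^{-N(1-S)}$ of all $2^N$ strings.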

4. Coding scheme 

Now let us choose a coding scheme that requires $Nl$
bits for the string of $N$ characters. Our aim is
to convert a string to a bit string and decode it; the whole
process has to be unique. Representing the coding and
decoding by "operators" $\mathcal{C}$ and $\mathcal{D}$, respectively, and any
string by $\langle c|$, what we want can be written in a familiar
form

$$\langle c|\,\mathcal{C}\,\mathcal{D} = \langle c| \quad \text{for all } \langle c|,$$

cat myfile|gzip|gunzip   gives   myfile

where the last line is the equivalent "pipeline" in a UNIX
or GNU/Linux system.
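The same round trip can be sketched in Python with the standard-library `gzip` module, where `gzip.compress` plays the role of $\mathcal{C}$ and `gzip.decompress` that of $\mathcal{D}$. The test string is an arbitrary, highly regular example, which is why it shrinks so much.

```python
import gzip

def roundtrip(data: bytes) -> bytes:
    """Python analogue of `cat myfile|gzip|gunzip`: decode(encode(x))."""
    return gzip.decompress(gzip.compress(data))

regular = b"01" * 5000                  # a highly regular 10,000-byte string
assert roundtrip(regular) == regular    # <c| C D = <c|
print(len(regular), "->", len(gzip.compress(regular)), "bytes")
```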

Let us take $l > S$. We may choose an $\epsilon$ such that $l > S + \epsilon$.
It then follows trivially that $\mathcal{N}_{\rm typ} \le 2^{N(S+\epsilon)} < 2^{Nl}$, where
$2^{Nl}$ is the total number of possible bit strings of length $Nl$. Hence all
the typical strings can be encoded. Nontypical strings
occur very rarely, but they may still be encoded.

If $l < S$, then for large $N$ and small $\epsilon$ we have $\mathcal{N}_{\rm typ} > 2^{Nl}$,
and obviously not all the typical strings can be encoded. Hence no
reliable coding is possible.

This completes the proof of the theorem. 
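The argument suggests a concrete two-part code, sketched below as our own illustration (all names and parameters are hypothetical). A typical string is sent as a flag bit 0 followed by its index in a list of typical strings, which takes about $N(S+\epsilon)$ bits, while a rare nontypical string is sent as a flag bit 1 followed by its raw $N$ bits. Decoding then inverts encoding for every string, as required. Enumerating all $2^N$ strings is only feasible for small $N$.

```python
import math
from itertools import product

def binary_entropy(p):
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def build_codebook(N, p, eps):
    """List every eps-typical string of length N (Eq. (B4)) and index it."""
    S = binary_entropy(p)
    typical = []
    for C in product((0, 1), repeat=N):           # strings as tuples, so hashable
        zeros = C.count(0)
        log2_P = zeros * math.log2(p) + (N - zeros) * math.log2(1 - p)
        if abs(-log2_P / N - S) <= eps:
            typical.append(C)
    index_bits = max(1, math.ceil(math.log2(max(1, len(typical)))))
    return typical, {C: i for i, C in enumerate(typical)}, index_bits

def encode(C, table, index_bits):
    """Flag 0 + index for typical strings, flag 1 + raw bits otherwise."""
    if C in table:
        return "0" + format(table[C], f"0{index_bits}b")
    return "1" + "".join(map(str, C))

def decode(bits, typical, index_bits, N):
    if bits[0] == "0":
        return typical[int(bits[1:1 + index_bits], 2)]
    return tuple(int(b) for b in bits[1:1 + N])

N = 16
typical, table, index_bits = build_codebook(N, p=0.1, eps=0.1)
C = typical[0]
bits = encode(C, table, index_bits)
print(len(typical), "typical strings;", len(bits), "bits instead of", N)
assert decode(bits, typical, index_bits, N) == C
```

With these parameters only the strings containing exactly two 0s are typical, so each is sent with 1 + 7 bits; nontypical strings cost one bit more than sending them raw, which is affordable because they are rare.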

APPENDIX C: HEAT GENERATED IN A CHIP 

As per a report of 1988, the energy dissipation
per logic operation has gone down from $10^{-3}$ joule
in 1945 to $10^{-13}$ joule in the 1980s (Ref: R. W.
Keyes, IBM J. Res. Devel. 32, 24 (1988); URL:
http://www.research.ibm.com/journal/rd/441/keyes.pdf).
For comparison, thermal energy at room temperature
is of the order of $10^{-20}$ joule.

If one could pack $10^{18}$ logic gates into one cubic centimetre,
each operating at 1 gigahertz with the minimal dissipation of order
$k_BT$ per operation, they would dissipate about 3 megawatts. Can one cool that?
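The arithmetic behind this estimate, assuming room temperature $T = 300$ K: with $k_BT$ per operation the total is about 4 MW, while $k_BT\ln 2$ (Landauer's bound for erasing one bit) gives about 2.9 MW, closer to the quoted figure. Either way it is megawatts per cubic centimetre.

```python
import math

k_B = 1.380649e-23      # Boltzmann constant, J/K (exact SI value)
T = 300.0               # assumed room temperature, K
gates = 1e18            # logic gates per cubic centimetre (from the text)
freq = 1e9              # operations per second per gate, 1 GHz (from the text)

kT = k_B * T
for label, e_op in [("k_B T", kT), ("k_B T ln 2", kT * math.log(2))]:
    power = gates * freq * e_op          # watts = (ops per second) x (joules per op)
    print(f"{label}: {e_op:.2e} J per op -> {power / 1e6:.1f} MW")

# How far the 1980s figure of 1e-13 J per operation lies above k_B T:
print(f"1e-13 J / k_B T = {1e-13 / kT:.1e}")
```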

A more recent example: for a Pentium 4 at 1.6 GHz,
if the CPU fan is kept off, the CPU temperature
during operation may reach 107 °C (yes, Celsius), as
monitored by standard system software on an HCL-made PC
(the one used to prepare this paper).






[***] The following references have been used extensively
here and should be consulted for details.

[1] A. N. Kolmogorov, Logical basis for information theory and 
probability theory, IEEE Trans. Information Theory, IT14, 
662 (1968). 

[2] Ming Li and Paul Vitanyi, An Introduction to Kolmogorov
Complexity and Its Applications (Springer-Verlag, New
York, 1997).

[3] M. A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge University Press,
Cambridge, UK, 2000).

[4] W. H. Zurek, Algorithmic randomness and physical entropy, 
Phys. Rev. A40, 4731 (1989). 

[5] C. H. Bennett, The thermodynamics of computation - a 
review, Int. J. Theo. Phys. 21, 905 (1982). 

[6] G. J. Chaitin, Randomness and mathematical proof, Sci. 
Am. 232, 47 (1975). 



